
Conversation

@SamRemis
Contributor

This pull request centralizes and clarifies timeout handling across the async client and transport layers by adding a transport-level error classification API and surfacing transport-detected timeouts as a dedicated ClientTimeoutError. Transports now return an ErrorInfo that indicates whether an exception represents a timeout and whether the fault is client- or server-side. The core async client consults that information and raises ClientTimeoutError when appropriate, so callers see a single, consistent exception for client-side timeouts.

ClientTransport implementations must now implement get_error_info(exception, **kwargs) and return an ErrorInfo indicating whether the exception is a timeout and whether the fault is client- or server-side. This breaking change was required so the core async client can reliably classify transport errors and raise a single ClientTimeoutError for client-side timeouts; that classification is also needed to handle errors as part of retries.
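A rough sketch of the shape this describes, with illustrative names (the review excerpt below annotates the return type as ClientErrorInfo, so the actual names and signatures in the PR may differ):

    from dataclasses import dataclass
    from enum import Enum
    from typing import Any


    class Fault(Enum):
        CLIENT = "client"
        SERVER = "server"


    @dataclass
    class ErrorInfo:
        is_timeout_error: bool = False
        fault: Fault | None = None


    class ClientTimeoutError(Exception):
        """Single, consistent exception surfaced for client-side timeouts."""


    async def _send_with_classification(transport: Any, request: Any) -> Any:
        # Assumes the transport exposes send() and the new get_error_info().
        try:
            return await transport.send(request)
        except Exception as exc:
            info: ErrorInfo = transport.get_error_info(exc)
            if info.is_timeout_error and info.fault is Fault.CLIENT:
                raise ClientTimeoutError(f"Client timeout occurred: {exc}") from exc
            raise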

@SamRemis requested a review from a team as a code owner on October 28, 2025 at 17:05
Contributor

@jonathan343 left a comment


I need to take a deeper look, but I don't think we're catching the new exception everywhere we need to. I hacked around in the CRT client to set connect_timeout_ms to 10. The result is that the SDK keeps trying to send the request indefinitely. When debugging, I see the response is:

Client timeout occurred: AWS_IO_SOCKET_TIMEOUT: socket operation timed out.

Update: This might be specific to the simple retry mode not being able to handle this. I need a bit more investigation time to get a cleaner answer, though.

exceptions represent timeout conditions for that transport.
"""

def get_error_info(self, exception: Exception, **kwargs: Any) -> ClientErrorInfo:
Contributor


I'm a bit hesitant to make this a required piece of ClientTransport. This is a breaking change for existing versions of smithy_http, since existing clients don't implement it. We need to do one of the following:

  • Make this optional and handle it gracefully
  • Include a breaking changelog entry so we know to version bump properly in the next release

I think I prefer the first option.
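A minimal sketch of that first option, assuming the transport may or may not define get_error_info and falling back to an unclassified result (ErrorInfo here is the illustrative type from the sketch above, not the PR's actual code):

    from typing import Any


    def classify_error(transport: Any, exception: Exception) -> "ErrorInfo":
        # Transports built against older smithy_http versions won't have the method;
        # return a default, unclassified ErrorInfo instead of raising AttributeError.
        get_error_info = getattr(transport, "get_error_info", None)
        if get_error_info is None:
            return ErrorInfo()
        return get_error_info(exception)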

Contributor


I don't know if I understand this conceptually. Transport is specifically an abstraction that takes a Request, "sends" it to an external entity or process, and then returns whatever response we get.

How is ErrorClassifyingTransport doing that? This seems like it's a post-processing step on a received response.

Contributor Author

@SamRemis Nov 12, 2025


The rationale behind the current implementation is that each client has its own unique set of exceptions that will be raised in the case of a timeout. These need to be caught and flagged as timeout exceptions so that our retry handlers can adjust the retry behavior appropriately.

Since the errors are specific to each HTTP client implementation, the exception-handling logic needs to live on the client itself, or we need some other way of associating a client with its error info. The get_error_info method would let us extend this in the future to return more information than just whether a given exception is a timeout, but a simpler alternative would be adding an is_timeout_error(exception) method directly to the ClientTransport protocol.

Unfortunately, Protocols in Python don't offer a way of marking a method as optional. We'd either need a default implementation that returns False, or runtime checks like the current hasattr(client, 'get_error_info').

I'm open to refactoring the approach if there's a preference between the two suggestions above, or an alternative idea.
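To make those two fallbacks concrete, a sketch of each, reusing the illustrative ErrorInfo from earlier (neither reflects the PR's actual code):

    from typing import Any, Protocol


    class ClientTransport(Protocol):
        async def send(self, request: Any) -> Any: ...

        # Option A: a default body on the Protocol that classifies nothing. Classes
        # that explicitly subclass ClientTransport inherit it; only purely structural
        # implementations would still need to define the method themselves.
        def get_error_info(self, exception: Exception, **kwargs: Any) -> "ErrorInfo":
            return ErrorInfo(is_timeout_error=False)


    # Option B: keep the method off the Protocol and probe for it at runtime.
    def is_timeout_error(transport: Any, exception: Exception) -> bool:
        if hasattr(transport, "get_error_info"):
            return transport.get_error_info(exception).is_timeout_error
        return False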

Contributor


The rationale behind the current implementation is that each client has its own unique set of exceptions that will be raised in the case of a timeout. These need to be caught and flagged as timeout exceptions so that our retry handlers can adjust the retry behavior appropriately.

I think we can accomplish this a few ways without needing to add this new concept. I think the original proposal for this was adding TIMEOUT_ERRORS and potentially other FOO_ERRORS constants onto each transport and making those part of the Protocol. Then we have a static location to look for this information across all transports and can write a single generalized function for error handling. Did we encounter issues with that approach?

Another option is to have the client's send method do that classification for us with its own error handling, ensuring that the subset of errors that should be classified as timeouts is raised as an error that subclasses a general TimeoutError.
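Sketches of both alternatives, with hypothetical transport names and exception sets (not the actual smithy-python transports):

    import asyncio
    from typing import Any


    # (a) A class-level constant on each transport, made part of the Protocol, so one
    # generalized helper can classify errors for any transport.
    class HypotheticalAioTransport:
        TIMEOUT_ERRORS: tuple[type[BaseException], ...] = (asyncio.TimeoutError,)

        async def send(self, request: Any) -> Any: ...


    def is_timeout(transport: Any, exc: Exception) -> bool:
        return isinstance(exc, getattr(transport, "TIMEOUT_ERRORS", ()))


    # (b) The send method normalizes its own timeouts into a shared TimeoutError subclass.
    class ClientTimeoutError(TimeoutError):
        pass


    class HypotheticalAioTransport2:
        async def send(self, request: Any) -> Any:
            try:
                ...  # perform the request with whatever underlying library this wraps
            except asyncio.TimeoutError as exc:
                raise ClientTimeoutError(str(exc)) from exc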

Contributor Author


TIMEOUT_ERRORS doesn't allow any sort of introspection into the contents of the error; it would only have information about the exception class. For something like the CRT, where all errors are instances of AwsCrtError but have varying name properties, there would be no way of determining whether it was actually a timeout error or not.

We could ask the CRT to start raising more helpful errors, but allowing a ClientTransport to do further introspection makes sense to me from an extensibility standpoint.

Considering this, I'll go with the second option and add the logic to the client's send method if no one has any objections. Thank you for the suggestions.
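For illustration, a get_error_info along those lines for a CRT-backed transport could key off the error's name rather than its class (the timeout-name set here is a guess seeded from the error quoted earlier, and ErrorInfo/Fault are the illustrative types from the first sketch, not the PR's actual code):

    from typing import Any

    from awscrt.exceptions import AwsCrtError


    # Every CRT failure is an AwsCrtError; only the name distinguishes a timeout.
    _CRT_TIMEOUT_ERROR_NAMES = {"AWS_IO_SOCKET_TIMEOUT"}


    class CRTTransportSketch:
        def get_error_info(self, exception: Exception, **kwargs: Any) -> "ErrorInfo":
            if isinstance(exception, AwsCrtError) and exception.name in _CRT_TIMEOUT_ERROR_NAMES:
                return ErrorInfo(is_timeout_error=True, fault=Fault.CLIENT)
            return ErrorInfo()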
